-
- OK, now you're asking for it. I've been mulling this
- stuff over in my head for a couple weeks, and I've got some
- pretty good ideas as to how it all fits together.
-
- My model of global hypermedia includes the following terms:
-
- Entity -- SGML and MIME use this term. WAIS calls it a document.
- Gopher calls it an item or a textfile or something.
- WWW used to call it a document, and now calls it
- a resource.
-
- The meaning is the same in all of them: a unit
- of retrieval [from the URL document].
-
- Content-Type -- MIME coined this term. SGML calls it a NOTATION.
- WAIS used to call it :type, but they'll call
- it :content-type if they follow up on what they
- told me. Most gopher types fall under this scheme
- (telnet, cso, and other types that don't use the gopher
- protocol don't fit).
-
- Reference -- This is the WWW anchor, the Gopher Menu item, the WAIS
- :document-id structure, the MIME message/external-body. It is
- enough information to 1) decide whether to retrieve the entity,
- 2) perform the retrieval transaction, and 3) process the entity
- once you've got it.
-
- >Really, though, the gopher reference is (in gopherspeak)
- >
- >Name=An arbitrary, but meaningful name
- >Host=gopher.micro.umn.edu
- >Port=70
- >Type=0
- >Path=Some Stuff
-
- NOTE: Some Stuff is terminated by a newline, and may not contain tabs.
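On the wire, a Gopher menu carries these five fields as one line per item: the type character glued to the name, then path, host and port separated by tabs (RFC 1436). That layout is why the path can't contain tabs. Here is a minimal parsing sketch; `GopherRef` and `parse_menu_line` are hypothetical names, not part of any Gopher library.

```python
from dataclasses import dataclass


@dataclass
class GopherRef:
    """The five fields of a Gopher reference, as listed above."""
    name: str  # display string shown to the user
    host: str
    port: int
    type: str  # one-character item type, e.g. "0" (text) or "1" (menu)
    path: str  # selector string; may not contain tabs or newlines


def parse_menu_line(line: str) -> GopherRef:
    """Parse one menu line: <type><name> TAB <path> TAB <host> TAB <port>."""
    fields = line.rstrip("\r\n").split("\t")
    # Gopher+ servers may append more tab-separated fields; ignore them.
    if len(fields) < 4 or not fields[0]:
        raise ValueError("malformed menu line: %r" % line)
    return GopherRef(name=fields[0][1:], host=fields[2], port=int(fields[3]),
                     type=fields[0][0], path=fields[1])


ref = parse_menu_line(
    "0An arbitrary, but meaningful name\tSome Stuff\tgopher.micro.umn.edu\t70\r\n")
print(ref)
```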
-
- >And the "href=" is just a way to squash it down to a single string.
- >It could just as well be a set of attributes and not a single one.
- >E.g.
- >
- ><a gopherhost="gopher.micro.umn.edu"
- > gopherport="70"
- > gopherpath="/Some Stuff"
- > gophertype="0">
- >An arbitrary, but meaningful, name</a>
-
- NOTE: for type 7 items, you need gophersearch="terms" too.
-
- >expresses the meaning of what's going on in a way that's far closer to
- >how SGML might do it as far as I have been able to make out...Dan is
- >that actually legal SGML?
-
- Sure, that's legal. I suggested that URLs be expressed in SGML a long
- time ago. Tim said it was overkill, and I'm starting to agree.
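Squashing the gopher attributes back into one string is mechanical. This sketch assumes the layout the gopher URL scheme eventually settled on (`gopher://host:port/`, then the type character and the %-escaped path, with type 7 search terms after an escaped tab, `%09`); the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def gopher_url(host: str, port: int, gtype: str, path: str,
               search: Optional[str] = None) -> str:
    """Squash the gopher* attributes into a single URL string."""
    # %-escape the path (spaces, tabs, etc.) but keep "/" readable.
    url = "gopher://%s:%d/%s%s" % (host, port, gtype, quote(path, safe="/"))
    if search is not None:
        # Type 7 items also carry the search terms, after an escaped tab.
        url += "%09" + quote(search, safe="")
    return url


print(gopher_url("gopher.micro.umn.edu", 70, "0", "/Some Stuff"))
# prints gopher://gopher.micro.umn.edu:70/0/Some%20Stuff
```

The string form and the attribute form carry the same information; the attribute form just leaves the escaping to the SGML parser.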
-
- Let's take a closer look at references:
-
- 1) What features allow users and clients to decide to retrieve an entity?
-
- WWW context and content of the anchor (Is it relevant?)
-
- MIME content-id (do I have this entity cached already?)
- content-description (relevant?)
- content-type (can I process it once I've got it?)
- SIZE (is it too big to bother?)
-
- WAIS :score (relevant to my query?)
- :headline (relevant?)
- :doc-id (in cache?)
- original/distributor-server,database,local-id particularly useful
- :number-of-lines, :number-of-bytes (too big?)
- :type, :content-type (can I process it?)
- :date (how old is it?)
-
- Gopher name (is it interesting?)
- type (can I process it?)
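A client could boil these questions down to a filter that runs before any transfer. The sketch below is illustrative only: `Ref`, `should_retrieve` and the one-megabyte default limit are assumptions, a field a protocol doesn't supply is left as None, and judging relevance is left to the user.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Ref:
    """Hypothetical reference record; None means the protocol didn't say."""
    title: str                          # anchor text, gopher name, WAIS :headline
    content_id: Optional[str] = None    # MIME content-id, WAIS :doc-id
    content_type: Optional[str] = None  # MIME content-type, WAIS :content-type
    size: Optional[int] = None          # MIME SIZE, WAIS :number-of-bytes


def should_retrieve(ref: Ref, cache: Set[str], supported: Set[str],
                    max_bytes: int = 1_000_000) -> Tuple[bool, str]:
    """Decide from the reference alone whether a transfer is worthwhile."""
    if ref.content_id is not None and ref.content_id in cache:
        return False, "already cached"
    if ref.content_type is not None and ref.content_type not in supported:
        return False, "cannot process " + ref.content_type
    if ref.size is not None and ref.size > max_bytes:
        return False, "too big"
    # Missing information never blocks a transfer; relevance is up to the user.
    return True, "ok"


print(should_retrieve(Ref("Big picture", content_type="image/gif"),
                      cache=set(), supported={"text/plain", "text/x-html"}))
```

Note that the richer the reference (WAIS and MIME here), the more of these checks can actually fire; a bare gopher reference only gets the type check.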
-
- 2) What features allow the client to make the transfer?
-
- WWW URL -- protocol, host, port, path, type, size, search terms
- handles local files, HTTP, gopher, WAIS connections.
- includes search terms for fulltext indexes.
- scheme mechanism allows gateways to new protocols
-
- MIME access-type, etc.: handles ftp, anon-ftp, local-file
- Ghost body allows arbitrary extra data.
-
- Gopher host, port, path, search words
-
- WAIS source (host, port, database), doc-id, search terms,
- relevant documents (these are the novel feature; quite handy)
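Because the WWW URL packs the transfer information into one string, a client can take it apart and dispatch on the scheme. A minimal sketch using Python's `urllib.parse`; the handler table is an assumption (80, 70 and 210 are the conventional HTTP, gopher and WAIS ports), and `plan_transfer` only describes the transaction rather than performing it.

```python
from urllib.parse import unquote, urlsplit

# Conventional ports: HTTP 80, gopher 70, WAIS 210 (assumed defaults).
DEFAULT_PORTS = {"http": 80, "gopher": 70, "wais": 210, "file": None}


def plan_transfer(url: str) -> dict:
    """Split a URL into what a retrieval transaction needs."""
    parts = urlsplit(url)
    scheme = parts.scheme or "file"  # no scheme: treat as a local file
    if scheme not in DEFAULT_PORTS:
        # A gateway for a new protocol would be registered here.
        raise ValueError("no handler for scheme: " + scheme)
    return {
        "protocol": scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS[scheme],
        "path": unquote(parts.path),
        "search": unquote(parts.query) if parts.query else None,
    }


print(plan_transfer("wais://quake.think.com/directory-of-servers?hypertext"))
```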
-
- 3) What features allow the client to process the entity?
- (Keep in mind that these are features of the reference -- this
- is information we have _before_ we transfer the entity).
-
- WWW processing is tied to the protocol. Content-Type
- of local files is inferred from file extensions.
-
- Entities from HTTP connections are assumed to
- be text/x-html.
-
- Gopher entities are typed: 0=text/plain, 1=application/x-gopher,
- w=text/x-html.
-
- WAIS entities are typed: TEXT=text/plain, WSRC=application/x-wais.
-
- MIME content-type mechanism is quite expressive. Any content-type
- can be encapsulated in a message/rfc822 entity. Multiple
- entities can be encapsulated in a multipart/mixed entity.
-
- Gopher gopher type tells you what to do with the data.
- text/plain and application/x-gopher are universally supported.
- Other types are supported by pilot projects.
-
- WAIS :type tells you what to do. text/plain and application/x-wsrc are
- supported. Other types are supported by pilot projects.
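Stripped down, the processing information above is a lookup from protocol-specific type codes to MIME content types, plus a guess from the file extension for local files. A sketch under those assumptions: the tables hold only the mappings listed for WWW above, and falling back to `application/octet-stream` for anything unknown is an assumption, not something the protocols specify.

```python
import mimetypes
from typing import Optional

# Only the mappings listed above for WWW; everything else is unmapped.
GOPHER_TYPES = {"0": "text/plain", "1": "application/x-gopher", "w": "text/x-html"}
WAIS_TYPES = {"TEXT": "text/plain", "WSRC": "application/x-wais"}

FALLBACK = "application/octet-stream"  # assumption: unknown means opaque bytes


def content_type(protocol: str, type_code: Optional[str] = None,
                 path: str = "") -> str:
    """Infer a MIME content type from what the reference tells us."""
    if protocol == "gopher":
        return GOPHER_TYPES.get(type_code or "", FALLBACK)
    if protocol == "wais":
        return WAIS_TYPES.get(type_code or "", FALLBACK)
    if protocol == "http":
        return "text/x-html"  # WWW assumes HTTP entities are HTML
    # Local files: infer from the file extension.
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or FALLBACK


print(content_type("gopher", "1"))
```

Everything outside the MIME row is a fixed table like this, which is the point: MIME content-type is the one mechanism here that scales to new types without a new table.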
-
-
- Now let's see how we should change the WWW reference mechanism.
-
- Here's what we've got currently:
-
- <!ELEMENT A - - (#PCDATA)>
- <!ATTLIST A
- NAME ID #IMPLIED
- HREF CDATA #IMPLIED
- TYPE CDATA #IMPLIED
- >
-
- What's the TYPE used for? It's not a data type. There's some
- code in LineMode to handle it, but I'm not sure what it does.
-
- The NAME identifies the anchor as the target of some other anchor.
- We should have NAME (or ID) attributes on pretty much all elements,
- for example:
-
- <DL>
- <DT ID=term>term<DD>definition
- </DL>
-
- The HREF attribute is enough information to retrieve an entity.
- Good. But it's got the #anchor stuck on the end. That should
- be a separate attribute. It should be an IDREF, so that we
- can validate that it references an existing ID with an SGML
- parser.
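Pulling the #anchor off an existing HREF is easy to automate, which would help when converting existing documents. A sketch using `urllib.parse.urldefrag` from the Python standard library; `split_href` and its result keys are hypothetical.

```python
from urllib.parse import urldefrag


def split_href(href: str) -> dict:
    """Separate an HREF into the entity to retrieve and the ID within it."""
    url, fragment = urldefrag(href)
    return {
        "url": url or None,      # None: the target is in the current entity
        "id": fragment or None,  # None: the link is to the whole entity
    }


print(split_href("Overview.html#term"))
# prints {'url': 'Overview.html', 'id': 'term'}
```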
-
- "But," you say, "what if it references an ID outside the current document?"
-
- I suggest we treat a group of nodes that reference each other not
- as separate documents, but as entities of one big document. That
- way, an author can validate the internal links in his/her web.
-
- I suggest two new elements: XREF, for intra-document links (i.e.
- links within the local web), and SEE for inter-document links
- (i.e. links that go outside the local web).
-
- <!ELEMENT XREF - - (#PCDATA)
- -- This element is for links within an HTML document. (a document
- is a collection of entities, or a web of nodes).
- -->
- <!ATTLIST XREF
- CONTEXT CDATA #IMPLIED -- entity containing the XREF is implied --
- -- SGML purists would make this attribute an ENTITY reference,
- and put the URL in the SYSTEM identifier in the prologue.
- For expediency, we put the URL right in the attribute.
- --
- ORIGIN CDATA #IMPLIED
- -- another URL, used as an identifier, rather than a locator.
- Ala the WAIS original-server,database,local-id triple.
- --
- REF IDREF #REQUIRED -- ID of referent element --
- >
-
- <!ELEMENT SEE - - (#PCDATA)
- -- This element is for links from an HTML document to any entity
- in the global web. The location and content-type of the entity
- are sufficient to resolve the reference.
-
- The other attributes could be specified in the text of the
- SEE content, but by making them attributes, the client software
- can process them, for example, to display a table of references
- sorted by date.
- -->
- <!ATTLIST SEE
- LOCATION CDATA #REQUIRED -- URL of referent entity --
- CONTENT-TYPE CDATA #REQUIRED -- MIME Content-Type for the entity --
- CHUNK CDATA #IMPLIED
- -- This is the analogue of the #anchor mechanism.
- If LOCATION names an SGML entity, this would be an ID,
- though it won't be validated.
- However, if LOCATION names a text file, this could be a line number.
- The meaning is defined by the content-type.
- --
- ORIGIN CDATA #IMPLIED
- FROM CDATA #IMPLIED -- email address or name of author/provider --
- DATE NMTOKEN #IMPLIED -- in ISO format: YYYYMMDDHHMMSSZ --
- BYTES NUMBER #IMPLIED -- useful in many cases --
- MD5 CDATA #IMPLIED -- data signature --
- >
-
- What do you think?
-
- Dan
-
-
-